DETAILED ACTION 

Claim Rejections - 35 USC § 103 

1. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

2. Claims 1, 4, and 11 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over Mitchell et al. (U.S. Patent 5,799,273), in view of Hon et al. (U.S. Patent 
6,490,563). 

In regard to claim 1, Mitchell et al. disclose a method for error detection within 
text transcribed from a first speech signal by an automatic speech-to-text transcription 
system (speech recognized during a dictation step, column 8, lines 52-57), comprising 
providing the first speech signal and the transcribed text outputs for a comparison 
between first speech signal and transcribed text for an identification of potential errors in 
the text (the dictated audio data is stored along with the transcribed text, so that a user 
can later compare the dictated audio and transcribed text to identify potential errors, 
column 10, line 52 to column 11, line 6). 

Mitchell et al. differs from the claimed invention by substitution of generating a 
second speech signal from the transcribed text and comparing the first speech signal 
and the second speech signal for comparing the first speech signal and the text directly. 

Hon et al. disclose a method for error detection within transcribed text comprising 
generating a second speech signal from the transcribed text for detecting potential 
errors (a speech recognition system converts input speech to text, which is then 
converted back to speech using TTS, column 7, lines 18-22). 

One of ordinary skill in the art at the time of invention could have substituted 
comparing the first speech signal to the second generated speech signal instead of 
comparing the first speech signal directly to the text and the result would have 
predictably allowed the user to "proofread" text using only audio (suggested as 
advantageous by Hon et al. at column 6, lines 2-5). 

Thus, it would have been obvious to one of ordinary skill in the art at the time of 
invention to modify Mitchell et al. to generate a second speech signal from the 
transcribed text and compare the first speech signal and the second speech signal to 
identify potential errors in the text. 

In regard to claim 4, Mitchell et al. do not disclose generating a second speech 
signal. 

Hon et al. disclose a method for error detection within transcribed text comprising 
generating a second speech signal by applying an inverse speech transcription process, 
generating a feature vector sequence from the text, using (a) statistical models of the 
speech-to-text transcription system and (b) a state sequence obtained in the process of 
transcription of the text from the first speech signal (HMM models are used to both 
generate the speech, column 6, lines 41-57; as well as recognize the input speech, 
column 8, lines 43-57). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to modify Mitchell et al. to generate a second speech signal by applying an 
inverse speech transcription process, generating a feature vector sequence from the 
text, using (a) statistical models of the speech-to-text transcription system and (b) a 
state sequence obtained in the process of transcription of the text from the first speech 
signal, because it would make the first speech signal and the second speech signal 
match more closely, and thus the two signals would be easier to listen to simultaneously for the 
user. 

In regard to claim 11, Mitchell et al. disclose an error detection system for a 
speech-to-text transcription system providing a transcribed text (412) from a first speech 
signal (400) (speech recognized during a dictation step, column 8, lines 52-57), the 
error detection system comprising: 

means for providing first (400, 418) speech signal and transcribed text for 
comparison between first speech signal and transcribed text for an identification of 
potential errors in the text (412) (the dictated audio data is stored along with the 
transcribed text, so that a user can later compare the dictated audio and transcribed text 
to identify potential errors, column 10, line 52 to column 11, line 6). 

Mitchell et al. differs from the claimed invention by substitution of means for 
generating a second speech signal from the transcribed text and comparing the first 
speech signal and the second speech signal for comparing the first speech signal and 
the text directly. 

Hon et al. disclose a system for error detection within transcribed text comprising 
means for generating a second speech signal from the transcribed text for detecting 
potential errors (a speech recognition system converts input speech to text, which is 
then converted back to speech using TTS, column 7, lines 18-22). 

One of ordinary skill in the art at the time of invention could have substituted 
comparing the first speech signal to the second generated speech signal instead of 
comparing the first speech signal directly to the text and the result would have 
predictably allowed the user to "proofread" text using only audio (suggested as 
advantageous by Hon et al. at column 6, lines 2-5). 

Thus, it would have been obvious to one of ordinary skill in the art at the time of 
invention to modify Mitchell et al. to generate a second speech signal from the 
transcribed text and compare the first speech signal and the second speech signal to 
identify potential errors in the text. 

3. Claims 2, 3, 5, 6, 12, 13, and 16-18 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Mitchell et al., in view of Hon et al., and further in view of Yamazaki 
(U.S. Patent 6,088,674). 

In regard to claim 2, Mitchell et al. and Hon et al. do not disclose the speed 
and/or the volume of the second speech signal matches the speed and/or the volume of 
the first speech signal. 

Yamazaki et al. disclose a method for comparing a first speech signal to a 
second speech signal generated from the text transcribed from the first speech signal, 
wherein: 

the speed and/or the volume of the second speech signal matches the speed 
and/or the volume of the first speech signal (speech is input to a speech recognition 
section, column 28, lines 6-9; the transcription of which is then used to generate a 
synthetic speech signal, lines 10-27; the amplitudes of the two waveforms are then 
matched, lines 54-60). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to further modify Mitchell et al. and Hon et al. to match the speed and/or 
volume of the second speech signal to the first signal, because it would make the two 
signals easier to listen to simultaneously for the user. 

In regard to claim 3, Mitchell et al. and Hon et al. do not disclose a set of filter 
functions is applied to the first speech signal to approximate the spectrum of the first 
speech signal to the spectrum of the second speech signal. 

Yamazaki et al. disclose a method for comparing a first speech signal to a 
second speech signal generated from the text transcribed from the first speech signal, 
wherein: 

a set of filter functions is applied to the first speech signal to approximate the 
spectrum of the first speech signal to the spectrum of the second speech signal (the 
voice tone is adjusted to match the speech signals, column 28, lines 61-67). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to further modify Mitchell et al. and Hon et al. to approximate the spectrum of 
the first speech signal to the spectrum of the second speech signal, because it would 
make the two signals easier to listen to simultaneously for the user. 

In regard to claims 5 and 12, Mitchell et al. and Hon et al. do not disclose a 
comparison signal is generated by subtracting or superimposing first and second 
speech signals. 

Yamazaki et al. disclose a method for comparing a first speech signal to a 
second speech signal generated from the text transcribed from the first speech signal, 
wherein: 

a comparison signal is generated by subtracting or superimposing first and 
second speech signals (see Fig. 24, the original waveform 10B and synthesized 
waveform 10C are compared, column 27, line 64 to column 28, line 5). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to further modify Mitchell et al. and Hon et al. to generate a comparison signal 
by subtracting or superimposing first and second speech signals, so a user could 
quickly visually confirm the comparison. 
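
For illustration only, the comparison signal recited in claims 5 and 12 can be sketched as a sample-wise operation on two already aligned signals. The sketch below is an assumption made for clarity: the function names are hypothetical, the signals are taken to be plain sequences of amplitude samples, and nothing here is drawn from Mitchell et al., Hon et al., Yamazaki et al., or the applicant's specification.

```python
from typing import List, Sequence


def subtract_signals(first: Sequence[float], second: Sequence[float]) -> List[float]:
    """Sample-wise difference (first minus second) over the overlapping samples.

    Assumes both signals are already time-aligned and share a sample rate.
    """
    n = min(len(first), len(second))
    return [first[i] - second[i] for i in range(n)]


def superimpose_signals(first: Sequence[float], second: Sequence[float]) -> List[float]:
    """Sample-wise sum over the overlapping samples (a simple superposition)."""
    n = min(len(first), len(second))
    return [first[i] + second[i] for i in range(n)]
```

Under these assumptions, a difference near zero indicates samples where the two signals agree, while a superposition lets both signals be rendered together.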

In regard to claims 6 and 13, Mitchell et al. and Hon et al. do not disclose the 
comparison signal is provided acoustically and/or visually. 

Yamazaki et al. disclose a method for comparing a first speech signal to a 
second speech signal generated from the text transcribed from the first speech signal, 
wherein: 

the comparison signal is provided acoustically and/or visually (see Fig. 24, a user 
can visually compare the original waveform 10B and synthesized waveform 10C and 
audibly compare them by using playback buttons 10F and 10G, column 27, line 64 to 
column 28, line 5). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to further modify Mitchell et al. and Hon et al. to provide the comparison signal 
acoustically and/or visually so the user could either a) quickly visually confirm the 
comparison or b) "proofread" text using only audio. 

In regard to claim 16, Mitchell et al. disclose a computer program product for 
error detection for a speech-to-text transcription system providing a transcribed text 
from a first speech signal (speech recognized during a dictation step, column 8, lines 
52-57), the computer program product comprising program means for: 

providing first speech signal outputs for a comparison between first speech signal 
and the transcribed text (the dictated audio data is stored along with the transcribed 
text, so that a user can later compare the dictated audio and transcribed text to identify 
potential errors, column 10, line 52 to column 11, line 6). 

Mitchell et al. differs from the claimed invention by substitution of generating a 
second speech signal from the transcribed text and comparing the first speech signal 
and the second speech signal for comparing the first speech signal and the text directly. 

Hon et al. disclose a method for error detection within transcribed text comprising 
generating a second speech signal from the transcribed text for detecting potential 
errors (a speech recognition system converts input speech to text, which is then 
converted back to speech using TTS, column 7, lines 18-22). 

One of ordinary skill in the art at the time of invention could have substituted 
comparing the first speech signal to the second generated speech signal instead of 
comparing the first speech signal directly to the text and the result would have 
predictably allowed the user to "proofread" text using only audio (suggested as 
advantageous by Hon et al. at column 6, lines 2-5). 

Thus, it would have been obvious to one of ordinary skill in the art at the time of 
invention to modify Mitchell et al. to generate a second speech signal from the 
transcribed text and compare the first speech signal and the second speech signal to 
identify potential errors in the text. 

Mitchell et al. and Hon et al. do not disclose the speed and/or the volume of the 
second speech signal matches the speed and/or the volume of the first speech signal. 

Yamazaki et al. disclose a method for comparing a first speech signal to a 
second speech signal generated from the text transcribed from the first speech signal, 
wherein: 

the speed and/or the volume of the second speech signal matches the speed 
and/or the volume of the first speech signal (speech is input to a speech recognition 
section, column 28, lines 6-9; the transcription of which is then used to generate a 
synthetic speech signal, lines 10-27; the amplitudes of the two waveforms are then 
matched, lines 54-60). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to further modify Mitchell et al. and Hon et al. to match the speed and/or 
volume of the second speech signal to the first signal, because it would make the two 
signals easier to listen to simultaneously for the user. 

In regard to claim 17, Mitchell et al. and Hon et al. do not disclose a comparison 
signal is generated by subtracting or superimposing first and second speech signals. 

Yamazaki et al. disclose a method for comparing a first speech signal to a 
second speech signal generated from the text transcribed from the first speech signal, 
wherein: 

a comparison signal is generated by subtracting or superimposing first and 
second speech signals (see Fig. 24, the original waveform 10B and synthesized 
waveform 10C are compared, column 27, line 64 to column 28, line 5). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to further modify Mitchell et al. and Hon et al. to generate a comparison signal 
by subtracting or superimposing first and second speech signals, so a user could 
quickly visually confirm the comparison. 

In regard to claim 18, Mitchell et al. and Hon et al. do not disclose the 
comparison signal is provided acoustically and/or visually. 

Yamazaki et al. disclose a method for comparing a first speech signal to a 
second speech signal generated from the text transcribed from the first speech signal, 
wherein: 

the comparison signal is provided acoustically and/or visually (see Fig. 24, a user 
can visually compare the original waveform 10B and synthesized waveform 10C and 
audibly compare them by using playback buttons 10F and 10G, column 27, line 64 to 
column 28, line 5). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to further modify Mitchell et al. and Hon et al. to provide the comparison signal 
acoustically and/or visually so the user could either a) quickly visually confirm the 
comparison or b) "proofread" text using only audio. 

Allowable Subject Matter 

4. Claims 7-10, 14, 15, 19, and 20 are objected to as being dependent upon a 
rejected base claim, but would be allowable if rewritten in independent form including all 
of the limitations of the base claim and any intervening claims. 

The following is a statement of reasons for the indication of allowable subject 
matter: 

In regard to claims 7, 14, and 19, Mitchell et al., Hon et al., and Yamazaki et al. 
do not disclose or suggest that an error signal is output when the comparison signal is 
beyond a predefined range. 
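
For illustration only, the limitation of claims 7, 14, and 19 can be read as a range check applied to each sample of the comparison signal. The following is a minimal sketch under that assumption; the function name and the bounds are hypothetical and are not taken from the claims, the specification, or any cited reference.

```python
from typing import List, Sequence


def error_positions(comparison: Sequence[float], lower: float, upper: float) -> List[int]:
    """Return the sample indices at which the comparison signal leaves [lower, upper].

    Each returned index stands in for an "error signal" being output at that
    point; the bounds play the role of the predefined range (assumed inclusive).
    """
    return [i for i, sample in enumerate(comparison) if sample < lower or sample > upper]
```

For example, with an assumed range of -0.5 to 0.5, the comparison signal [0.0, 0.6, -0.7, 0.1] would be flagged at indices 1 and 2.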

In regard to claims 9, 15, and 20, Mitchell et al., Hon et al., and Yamazaki et al. 
do not disclose or suggest detecting patterns in the comparison signal to detect a 
certain type of error in the transcribed text. 

Conclusion 

5. The prior art made of record and not relied upon is considered pertinent to 
applicant's disclosure. Runge et al. (U.S. Patent Application Publication 2006/0149546) 
is an intervening reference that anticipates the applicant's claimed invention. Hanson 
(U.S. Patent 6,064,965) discloses a method for combining original speech and 
synthesized speech for audible proofreading. Hanson (U.S. Patent 6,338,038) and 
Lewis et al. (U.S. Patent 7,010,489) disclose methods for matching the speed of 
synthesized speech to original speech. Buth et al. (U.S. Patent 6,546,369) disclose a 
method that compares synthesized speech to input speech to determine a correct 
pronunciation for the synthesized speech. 

6. Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to BRIAN L. ALBERTALLI whose telephone number is 
(571)272-7616. The examiner can normally be reached on Mon - Fri, 8:00 AM - 5:30 
PM, every second Fri off. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, David Hudspeth, can be reached on (571) 272-7843. The fax phone number 
for the organization where this application or proceeding is assigned is 571-273-8300. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 

BLA 5/8/08 

/David R Hudspeth/ 

Supervisory Patent Examiner, Art Unit 2626 



